kohya-ss lora support #295
Conversation
Force-pushed from 3570f32 to a8cd34e (compare)
It looked like I had pushed up half-broken code. It should now be in a working state. I'm not sure why the global
Work was started to enable reusing the existing transformer, which would have shaved off ~30 seconds per generation on my local machine if there were no changes to the base model or LoRA weights, but it would have required additional work that I do not currently have the time to put in.
Force-pushed from 4b4c2e1 to b130f16 (compare)
@colinurbs I went ahead and hacked in a manager to enable reuse of the existing transformer when there are no changes to the model or any weights. Without this additional change there is around 30-45 seconds of load time for the LoRAs while using kohya_ss's implementation. It's pretty nice so far. These were generated at 256x256 for testing, but the LoRAs seem to be performing some heavy lifting. 250707_231212_812_2865_9.mp4 250707_231155_707_4831_9.mp4
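A rough sketch of the reuse check described above (illustrative only; the class and method names are assumptions, not the actual manager in this PR):

import typing

class TransformerReuseCache:
    """Keep the loaded transformer while the base model and LoRA weights are unchanged."""

    def __init__(self) -> None:
        self._transformer: typing.Any = None
        self._fingerprint: typing.Any = None

    def get(self, model_name: str, lora_weights: dict, build_fn: typing.Callable):
        # Fingerprint the inputs that would invalidate the loaded transformer.
        fingerprint = (model_name, tuple(sorted(lora_weights.items())))
        if self._transformer is None or fingerprint != self._fingerprint:
            # Cache miss: rebuild/reload (this is the ~30-45 s cost mentioned above).
            self._transformer = build_fn(model_name, lora_weights)
            self._fingerprint = fingerprint
        return self._transformer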
This is fantastic work, thank you so much. I see you've left it as a draft. Is there any reason I shouldn't merge this into develop and start testing it?
@colinurbs no reason from my side to not merge into develop, I'll remove the draft status.
Force-pushed from 590cb22 to 1e6a32c (compare)
modules/pipelines/worker.py (Outdated)

if current_generator is not None and current_generator.transformer is not None:
    offload_model_from_device_for_memory_preservation(
        current_generator.transformer, target_device=gpu, preserved_memory_gb=settings.get("gpu_memory_preservation", 8.0))
In the inherited FP demo code from Illyasviel, this was explicitly preserved at 8GB, unlike most preservations which used his Setting, which defaulted to 6GB. I don't know if this change is an issue, but it should be tested. (There's a second similar change below.)
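For reference, a minimal illustration of the behavioural difference being flagged (the settings value is an assumption, not the actual worker.py code): if the gpu_memory_preservation setting defaults to 6.0 elsewhere, the new call preserves less memory than the demo's hard-coded 8 GB.

# Illustrative only: stand-in for the Settings object discussed above.
settings = {"gpu_memory_preservation": 6.0}  # assumed user setting / default

# Inherited FP demo behaviour: always preserve 8 GB at this call site.
preserved_gb_demo = 8.0

# New behaviour in this PR: read the user setting (falling back to 8.0 here).
preserved_gb_new = settings.get("gpu_memory_preservation", 8.0)

print(preserved_gb_demo, preserved_gb_new)  # 8.0 vs 6.0 with the assumed setting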
modules/pipelines/worker.py (Outdated)

offload_model_from_device_for_memory_preservation(studio_module.current_generator.transformer, target_device=gpu, preserved_memory_gb=8)
current_generator.move_lora_adapters_to_device(cpu)
offload_model_from_device_for_memory_preservation(
    current_generator.transformer, target_device=gpu, preserved_memory_gb=settings.get("gpu_memory_preservation", 8.0))
In the inherited FP demo code from Illyasviel, this was explicitly preserved at 8GB, unlike most preservations which used his Setting, which defaulted to 6GB. I don't know if this change is an issue, but it should be tested. (There's a second similar change above.)
modules/pipelines/worker.py (Outdated)

studio_manager.current_generator = current_generator = new_generator
# Load the transformer model
current_generator.load_model()
It's not clear why we load here without moving the transformer to the GPU, but on line 296 a pre-existing transformer does go to the GPU.
modules/pipelines/worker.py (Outdated)

    f"Worker: AFTER model assignment, current_generator is {type(current_generator)}, id: {id(current_generator)}")
if current_generator:
    print(
        f"Worker: current_generator.transformer is {type(current_generator.transformer)}. load_model() will be called next.")
The model is already loaded above.
I think this was just some print debugging. I'll remove the noise.
Keeps downloading model files even when it has already found them...:
The first commit of this has been merged into develop and seems to be working well so far.
This is loading the model from disk, not downloading. The progress bar is a function of tqdm.
@colinurbs you were just a bit too quick for me :D I was working on minimizing the changes and exposing the settings in https://github.com/arledesma/temp-FramePack-Studio/commits/feature/kohya-ss-loraready, which is from a couple of hours ago, around the time that you merged the commit to develop. I have https://github.com/arledesma/temp-FramePack-Studio/commits/feature/kohya-ss-loraready-develop/ merged with the current develop. Providing the choice for the loader and model reuse in the settings page seemed like an easier A/B test. Regardless, I'm good with anything that you want to do over here. 👍🏽
https://github.com/arledesma/temp-FramePack-Studio/commits/feature/kohya-ss-loraready-develop/ is currently unloading the LoRAs when reusing the model transformer, so only the first generation will apply LoRA weights. Definitely not in a desirable state.
@arledesma thanks for this. I'm going to roll back develop and merge this whole thing in this evening. I haven't had time to dig into the code, did you add an exception for already downloaded models to prevent it from forcing existing users to re-download?
@colinurbs I have yet to replicate any models being re-downloaded. I've attempted multiple times with different settings and even reinstalled the entire repository without experiencing the issue. Could it just be what is mentioned in #295 (comment), where loading from disk is being misinterpreted as downloading? If so then maybe we can update the wording in lora_utils.load_safetensors_with_fp8_optimization() to use something other than "Loading" in the progress description:

def load_safetensors_with_fp8_optimization(
    model_files: list[str],
    fp8_optimization: bool,
    device: torch.device,
    weight_hook: Callable | None = None,
) -> dict[str, torch.Tensor]:
    """
    Load state dict from safetensors files and merge LoRA weights into the state dict with fp8 optimization if needed.
    """
    state_dict = {}
    if fp8_optimization:
        raise RuntimeWarning("FP8 optimization is not yet supported in this version.")
        from .fp8_optimization_utils import (
            optimize_state_dict_with_fp8_on_the_fly,
        )

        # Optimization targets and exclusion keys
        TARGET_KEYS = ["transformer_blocks", "single_transformer_blocks"]
        EXCLUDE_KEYS = [
            "norm"
        ]  # Exclude norm layers (e.g., LayerNorm, RMSNorm) from FP8

        print(f"FP8: Optimizing state dictionary on the fly")
        # Optimized state dictionary in FP8 format
        state_dict = optimize_state_dict_with_fp8_on_the_fly(
            model_files,
            device,
            TARGET_KEYS,
            EXCLUDE_KEYS,
            move_to_device=False,
            weight_hook=weight_hook,
        )
    else:
        from .safetensors_utils import MemoryEfficientSafeOpen

        state_dict = {}
        for model_file in model_files:
            with MemoryEfficientSafeOpen(model_file) as f:
                for key in tqdm(
                    f.keys(),
                    desc=f"Loading {os.path.basename(model_file)}",
                    leave=False,
                ):
                    value = f.get_tensor(key)
                    if weight_hook is not None:
                        value = weight_hook(key, value)
                    state_dict[key] = value

    return state_dict
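As one possible follow-up to the wording concern, a tiny illustration (the path and keys below are placeholders, not committed code) of making the tqdm description explicit that this is a disk read rather than a download:

import os
from tqdm import tqdm

# Hypothetical wording tweak: make it obvious the progress bar reflects reading
# an existing safetensors file from disk, not downloading it.
model_file = "/path/to/model-00001-of-00003.safetensors"  # placeholder path
keys = ["weight_a", "weight_b"]  # placeholder keys standing in for f.keys()

for key in tqdm(keys, desc=f"Loading {os.path.basename(model_file)} from disk", leave=False):
    pass  # f.get_tensor(key) would run here in the real loader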
Ok, I'll try this again tonight and see if it happens for me. I believe it's due to the slightly different structure being used in the hf cache folder. But I'll let you know.
@arledesma see my long message in #testers on Discord last night (that I @'d you on) for a detailed description of why this change in studio.py caused me re-downloads: https://github.com/FP-Studio/framepack-studio/pull/295/files#diff-05934289eba73cfacb716666819e900c0a2212ad0c7952f5cbb014617b3b739bR26
Ahh, I see. It's almost odd that users with an explicitly set HF_HOME were not demanding that it be honored and patching studio.py with each change. I had been manually adding it each time that I pulled, and it just got lost in the original PR (as I didn't have the time to finish the work and pushed everything from my local). Maybe, in the future, we can just add a CLI flag to skip setting the environment variable and let the SDK define how it behaves.
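A minimal sketch of one way studio.py could avoid clobbering an existing cache (hypothetical guard, not the current code; the fallback path is an assumption):

import os

# Only set HF_HOME when the user has not already configured it, so existing
# caches (e.g. Pinokio-managed installs) keep being reused and models are not
# re-downloaded into a new location. A CLI flag could also force-skip this.
if "HF_HOME" not in os.environ:
    os.environ["HF_HOME"] = os.path.abspath("./hf_download")  # assumed default cache path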
Brings support from kohya-ss implementations in their FramePack-LoraReady fork as well as their contributions to FramePack-eichi (that do not seem to be correctly attributed to kohya-ss in their primary eichi repo) https://gist.github.com/kohya-ss/fa4b7ae7119c10850ae7d70c90a59277 https://github.com/kohya-ss/FramePack-LoRAReady/blob/3613b67366b0bbf4a719c85ba9c3954e075e0e57 https://github.com/kohya-ss/FramePack-eichi/blob/4085a24baf08d6f1c25e2de06f376c3fc132a470
We do not manage multiple VideoJobQueue instances, so this singleton can be imported and used anywhere that we need access to the queue
Enable switching between known LoRA loaders. Includes a StrEnum implementation for Python 3.10 (or older 3.x) users; otherwise the builtin StrEnum from Python >= 3.11 is used
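A sketch of the compatibility shim described in that commit message (the enum name and members below are illustrative assumptions based on the two loader choices discussed in this PR):

import sys
from enum import Enum

if sys.version_info >= (3, 11):
    from enum import StrEnum  # builtin on Python >= 3.11
else:
    class StrEnum(str, Enum):
        """Minimal stand-in for Python 3.10 and older 3.x."""
        def __str__(self) -> str:
            return str(self.value)

# Hypothetical enum name; the PR's actual enum may be named differently.
class LoraLoader(StrEnum):
    DIFFUSERS = "diffusers"
    LORA_READY = "lora_ready"

print(LoraLoader.LORA_READY == "lora_ready")  # True: members compare as plain strings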
We do not manage multiple Settings objects, so this singleton can be imported and used anywhere that we need access to the Settings
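A minimal sketch of the module-level singleton pattern these two commit messages describe (class, keys, and defaults here are assumptions, not the actual modules):

# settings.py-style module: the instance is created once at import time and
# other modules import it directly instead of constructing their own.
class Settings:
    def __init__(self):
        self._values = {"gpu_memory_preservation": 8.0}  # example default only

    def get(self, key, default=None):
        return self._values.get(key, default)

    def set(self, key, value):
        self._values[key] = value

settings = Settings()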
These are defaulted to continue existing behavior: the diffusers LoRA loader and no reuse of the model instance
…ransformer3DModel This was existing behavior that was found to not be mapped.
This was leading to the model being downloaded again for users that did have HF_HOME set. We will need to document the migration path for existing users to avoid re-downloading all of the models.
Pinokio users (like me) have it set for them (via the Pinokio ENVIRONMENT) without even realizing it. It's possible some others have set it for other AI projects and wouldn't even realize we use it. I know some users with multiple installs have mentioned on the Discord that they use junction links to avoid multiple model copies. But I think it just hasn't been on that many people's radars.
That sounds reasonable, but it's a little outside my lane to actually weigh in unless I learn more about the options and conventions. Whatever we do, we just need to take care that legacy users don't download another 80GB (and maybe not even know if or where there are duplicates to delete).
When loaded from a queue import, the selected_loras and lora_values are equal in length. When loaded from the Generate interface, the lora_values are the same length as the lora_loaded_names and must be reduced. This change now assumes that when the two lists are the same length, they are in the correct order.
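A hedged sketch of the assumption described in that commit message (the function name is hypothetical, not the actual code path):

def resolve_lora_values(selected_loras, lora_values, lora_loaded_names):
    if len(lora_values) == len(selected_loras):
        # Queue import: the values already correspond 1:1 (assumed same order).
        return list(lora_values)
    # Generate interface: values are aligned with lora_loaded_names and must be
    # reduced down to only the selected LoRAs.
    return [lora_values[lora_loaded_names.index(name)] for name in selected_loras]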
diffusers uses this environment variable to automatically download files on import. It's a weird side effect to do that amount of actual work on import.
Force-pushed from b044cb2 to a8cba73 (compare)
So to sum up testing for now:
In Settings it needs the Experimental settings: LoRA loader set to lora_ready (might want to make this the default).
Evaluating if we should default to the new loader. Pros:
Cons:
I'm sure that there are additional pros and cons.
Brings support from kohya-ss implementations in their FramePack-LoraReady fork as well as their contributions to FramePack-eichi
https://gist.github.com/kohya-ss/fa4b7ae7119c10850ae7d70c90a59277
https://github.com/kohya-ss/FramePack-LoRAReady/blob/3613b67366b0bbf4a719c85ba9c3954e075e0e57
https://github.com/kohya-ss/FramePack-eichi/blob/4085a24baf08d6f1c25e2de06f376c3fc132a470
Features: diffusers, lora_ready
UI: diffusers, lora_ready
Additional:
LoRAs like https://civitai.com/models/1518315/transporter-effect-from-star-trek-the-next-generation-or-hunyuan-video-lora are functioning with this implementation, while failing with the existing (diffusers) implementation.
Under the covers
- LoRA keys are named with a prefix of lora_unet_. The lora name replaces any . with a _. The keys then have a format of lora_unet_{lora_name}.
- The lora loader iterates through the lora dictionary keys to rename them to a consistent naming:
  - keys with diffusion_model or transformer that end with lora_A are converted with convert_from_diffusion_pipe_or_something
  - keys with double_blocks or single_blocks are identified as hunyuan and converted with convert_hunyan_to_framepack
- convert_from_diffusion_pipe_or_something renames diffusion_model keys from _lora_A_ to .lora_down. and from _lora_B_ to .lora_up.
- convert_from_diffusion_pipe_or_something assigns the first dimension of the lora_down to the lora weight
- convert_from_diffusion_pipe_or_something adds the alpha with rank to the weight
- convert_hunyan_to_framepack renames and splits keys in double_blocks and single_blocks
- convert_hunyan_to_framepack splits up and slices QKV keys (or QKVM keys) into individual Q, K, V (or Q, K, V, M) keys
- fp8 is out of scope; it has been set to raise an exception when used, to alert future development
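A rough sketch of the renaming step described above (not the actual lora_utils code; the function name and alpha handling are assumptions based on this description): map diffusion-pipe style keys onto the lora_down/lora_up naming and record an alpha equal to the rank, taken from the first dimension of the lora_down weight.

import torch

def rename_diffusion_pipe_keys(state_dict: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
    new_sd: dict[str, torch.Tensor] = {}
    for key, value in state_dict.items():
        # _lora_A_ -> .lora_down. and _lora_B_ -> .lora_up., as described above.
        new_key = key.replace("_lora_A_", ".lora_down.").replace("_lora_B_", ".lora_up.")
        new_sd[new_key] = value
        if ".lora_down." in new_key and value.ndim >= 1:
            rank = value.shape[0]  # first dimension of lora_down is the rank
            alpha_key = new_key.split(".lora_down.")[0] + ".alpha"
            new_sd[alpha_key] = torch.tensor(float(rank))  # alpha recorded alongside the weight
    return new_sd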